
    Clinically driven semi-supervised class discovery in gene expression data

    Motivation: Unsupervised class discovery in gene expression data relies exclusively on the statistical signals in the data to drive the results. Often, however, one wants to constrain the search space to respect certain biological prior knowledge while still allowing a flexible search within these boundaries. Results: We develop an approach to semi-supervised class discovery. One component of our approach uses clinical sample information to constrain the search space and guide the class discovery process toward biologically relevant partitions. A second component uses known biological annotation of genes to drive the search, seeking partitions that manifest strong differential expression in specific sets of genes. We develop efficient algorithms for these tasks and implement both approaches and combinations thereof. We show that our method is robust enough to detect known clinical parameters in accordance with expected clinical values. We also use our method to elucidate putative risk factors for cardiovascular disease (CVD). Availability: MonoClaD (Monotone Class Discovery). See http://bioinfo.cs.technion.ac.il/people/zohar/MonoClad/ Supplementary information: Supplementary data are available at http://bioinfo.cs.technion.ac.il/people/zohar/MonoClad/software.html
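The idea of letting a clinical parameter restrict the partition search can be illustrated with a deliberately simplified Python sketch. It assumes a single clinical variable. It considers only "monotone" two-class partitions given by a threshold on that variable, and it scores each partition by the mean absolute Welch t-statistic over a chosen gene set. The function names and the scoring rule are illustrative assumptions, not MonoClaD's actual algorithm or interface.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t-statistic for two groups (each must have at least 2 values)."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    diff = mean(a) - mean(b)
    if se == 0:
        # No within-group spread: infinitely strong separation unless the means are equal
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / se


def best_monotone_split(clinical, expr, gene_set, min_size=2):
    """Search two-class partitions defined by a threshold on one clinical variable.

    clinical: per-sample clinical values
    expr:     dict gene -> per-sample expression values (same sample order)
    gene_set: genes whose differential expression scores each partition
    Returns (threshold, score); samples with clinical value <= threshold form the low class.
    """
    order = sorted(range(len(clinical)), key=lambda i: clinical[i])
    best_threshold, best_score = None, -math.inf
    for k in range(min_size, len(order) - min_size + 1):
        # Cut only between distinct clinical values so the split is a true threshold
        if clinical[order[k - 1]] == clinical[order[k]]:
            continue
        low, high = order[:k], order[k:]
        score = mean(
            abs(welch_t([expr[g][i] for i in low], [expr[g][i] for i in high]))
            for g in gene_set
        )
        if score > best_score:
            best_threshold, best_score = clinical[order[k - 1]], score
    return best_threshold, best_score
```

For example, with `clinical = [1, 2, 3, 4, 5, 6]` and a gene `g1` whose expression jumps from about 0 to about 5 between the third and fourth samples, `best_monotone_split(clinical, expr, ["g1"])` picks the threshold 3. Restricting the search to thresholds is what keeps it linear in the number of samples rather than exponential.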

    Novel Rank-Based Statistical Methods Reveal MicroRNAs with Differential Expression in Multiple Cancer Types

    BACKGROUND: MicroRNAs (miRNAs) regulate target genes at the post-transcriptional level and play important roles in cancer pathogenesis and development. Variation among individuals is a significant confounding factor in miRNA (and other) expression studies: inter-patient variation can obscure biologically or clinically meaningful differential expression. In this study we aim to identify miRNAs with consistent differential expression in multiple tumor types using a novel data analysis approach. METHODS: Using microarrays, we profiled the expression of more than 700 miRNAs in 28 matched tumor/normal samples from 8 different tumor types (breast, colon, liver, lung, lymphoma, ovary, prostate and testis). This set is unique in its emphasis on minimizing tissue-type and patient-related variability by using normal and tumor samples from the same patient. We develop scores for comparing miRNA expression in such matched-sample data, based on a rigorous characterization of the distribution of order statistics over a discrete state set, including exact p-values. Specifically, we compute a Rank Consistency Score (RCoS) for every miRNA measured in our data. Our methods are also applicable in various other contexts. We compare our methods, as applied to matched samples, to the paired t-test and to the Wilcoxon signed-rank test. RESULTS: We identify miRNAs that are consistently differentially expressed across the cancer types measured. At an FDR (False Discovery Rate) of 0.05, 41 miRNAs are under-expressed in cancer compared to normal, and 17 are over-expressed. Differentially expressed miRNAs include known oncomiRs (e.g., miR-96) as well as miRNAs that were not previously universally associated with cancer. Specific examples include miR-133b and miR-486-5p, which are consistently downregulated, and miR-629*, which is consistently upregulated in cancer, in the context of our cohort. Data are available in GEO.
Software is available at: http://bioinfo.cs.technion.ac.il/people/zohar/RCoS
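The following Python sketch illustrates the kind of exact order-statistic p-value described above. It rests on two assumptions:

- Within each matched pair, every miRNA is ranked by its tumor-minus-normal difference, with rank 1 being the most under-expressed.
- Under the null hypothesis, a given miRNA's rank in each pair is uniform on {1, ..., N} and independent across pairs.

The score used here is the k-th smallest rank across pairs. This score and all function names are hypothetical simplifications, not necessarily the published RCoS definition.

```python
from fractions import Fraction
from math import comb


def ranks_in_pairs(diffs):
    """diffs: one dict per matched tumor/normal pair, mapping miRNA -> tumor minus normal.

    Returns dict miRNA -> list of per-pair ranks (1 = most negative difference;
    ties are broken by name, a simplification).
    """
    ranks = {}
    for d in diffs:
        for r, mirna in enumerate(sorted(d, key=lambda name: (d[name], name)), start=1):
            ranks.setdefault(mirna, []).append(r)
    return ranks


def kth_order_stat_cdf(r, k, n, N):
    """Exact P(k-th smallest of n i.i.d. uniform draws on {1..N} <= r).

    The k-th smallest is <= r exactly when at least k of the n draws are <= r,
    and each draw is <= r with probability r/N.
    """
    p = Fraction(r, N)
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))


def consistency_pvalue(ranks, N, k=None):
    """One-sided p-value for consistently low ranks; k defaults to n (the maximum rank)."""
    n = len(ranks)
    k = n if k is None else k
    observed = sorted(ranks)[k - 1]
    return kth_order_stat_cdf(observed, k, n, N)
```

With k = n the score is the worst (largest) rank, and the p-value reduces to (r/N)^n. For example, ranks [1, 2, 3] among N = 10 miRNAs give exactly 27/1000. Smaller k tolerates a few inconsistent pairs.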

    EXPANDER – an integrative program suite for microarray data analysis

    BACKGROUND: Gene expression microarrays are a prominent experimental tool in functional genomics that has opened opportunities for gaining a global, systems-level understanding of transcriptional networks. Experiments that apply this technology typically generate overwhelming volumes of data, unprecedented in biological research. Mining meaningful biological knowledge from the raw data is therefore a major challenge in bioinformatics. Of special need are integrative packages that provide biologist users with an advanced yet easy-to-use set of algorithms that together cover the whole range of steps in microarray data analysis. RESULTS: Here we present the EXPANDER 2.0 (EXPression ANalyzer and DisplayER) software package. EXPANDER 2.0 is an integrative package for the analysis of gene expression data, designed as a 'one-stop shop' tool that implements various data analysis algorithms. These range from the initial steps of normalization and filtering, through clustering and biclustering, to high-level functional enrichment analysis that points to biological processes active in the examined conditions, and promoter cis-regulatory element analysis that elucidates the transcription factors controlling the observed transcriptional response. EXPANDER is available with pre-compiled functional Gene Ontology (GO) and promoter sequence-derived data files for yeast, worm, fly, rat, mouse and human, supporting high-level analysis of data obtained from these six organisms. CONCLUSION: EXPANDER's integrated capabilities and its built-in support of multiple organisms make it a very powerful tool for the analysis of microarray data. The package is freely available for academic users a
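As a generic illustration of the "normalization and filtering" steps that precede clustering, the Python sketch below does two things. It keeps the most variable genes, then standardizes each retained gene to mean 0 and standard deviation 1. The variance-based filter, the function name and the `keep` parameter are assumptions made for illustration. They do not describe EXPANDER's own implementation or options.

```python
from statistics import mean, pstdev


def filter_and_standardize(expr, keep):
    """Keep the `keep` genes with the largest standard deviation, then z-score each one.

    expr: dict gene -> list of expression values across conditions.
    Genes with zero spread are mapped to all zeros instead of dividing by zero.
    """
    top = sorted(expr, key=lambda g: pstdev(expr[g]), reverse=True)[:keep]
    result = {}
    for g in top:
        mu, sd = mean(expr[g]), pstdev(expr[g])
        result[g] = [(x - mu) / sd for x in expr[g]] if sd > 0 else [0.0] * len(expr[g])
    return result
```

Standardizing each gene makes clustering compare expression *patterns* rather than absolute levels. After this step, two genes with the same shape but different magnitudes end up with identical profiles.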

    GOrilla: a tool for discovery and visualization of enriched GO terms in ranked gene lists

    Background: Since the inception of the GO annotation project, a variety of tools have been developed that support exploring and searching the GO database. In particular, many tools that perform GO enrichment analysis are currently available. Most of these tools require as input a target set of genes and a background set, and seek enrichment in the target set compared to the background set. A few tools also support analyzing ranked lists; these typically rely on simulations or on union-bound correction to assign statistical significance to the results. Results: GOrilla is a web-based application that identifies enriched GO terms in ranked lists of genes, without requiring the user to provide explicit target and background sets. This is particularly useful in the many typical cases where genomic data may be naturally represented as a ranked list of genes (e.g., by level of expression or of differential expression). GOrilla employs a flexible-threshold statistical approach to discover GO terms that are significantly enriched at the top of a ranked gene list. Building on a complete theoretical characterization of the underlying distribution, called mHG, GOrilla computes an exact p-value for the observed enrichment, taking threshold multiple testing into account without the need for simulations. This enables rigorous statistical analysis of thousands of genes and thousands of GO terms on the order of seconds. The output of the enrichment analysis is visualized as a hierarchical structure, providing a clear view of the relations between enriched GO terms. Conclusion: GOrilla is an efficient GO analysis tool with unique features that make it a useful addition to the existing repertoire of GO enrichment tools.
GOrilla's unique features and advantages over other threshold-free enrichment tools include rigorous statistics, fast running time and an effective graphical representation. GOrilla is publicly available at: http://cbl-gorilla.cs.technion.ac.il
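To make the mHG statistic concrete, here is a minimal Python sketch. Take a ranked list of N genes, B of which carry a given GO term. HGT(b; N, B, n) is the hypergeometric tail probability of seeing at least b annotated genes in the top n, and mHG is the minimum of HGT over all cutoffs n. The exact p-value comes from a dynamic program over the (n, b) lattice: it counts the orderings whose mHG stays above the observed value, and the complement of that count gives the probability of an mHG at least as extreme. The code uses exact rational arithmetic and is meant for small examples. The function names are hypothetical, and this is not GOrilla's own code.

```python
from fractions import Fraction
from math import comb


def hgt(b, N, B, n):
    """Exact hypergeometric tail P(X >= b) for X ~ HG(N genes, B annotated, n drawn)."""
    tail = sum(comb(B, k) * comb(N - B, n - k) for k in range(b, min(n, B) + 1))
    return Fraction(tail, comb(N, n))


def mhg_score(labels):
    """labels: 0/1 flags in ranked order (1 = gene annotated with the GO term)."""
    N, B = len(labels), sum(labels)
    best, b = Fraction(1), 0
    for n in range(1, N + 1):  # every cutoff n of the ranked list
        b += labels[n - 1]
        best = min(best, hgt(b, N, B, n))
    return best


def mhg_pvalue(labels):
    """Exact P(mHG <= observed) when all C(N, B) orderings are equally likely."""
    N, B = len(labels), sum(labels)
    observed = mhg_score(labels)
    # paths[b]: number of prefixes of the current length with b ones whose
    # HGT has stayed above `observed` at every cutoff so far
    paths = [1] + [0] * B
    for n in range(1, N + 1):
        new = [0] * (B + 1)
        for b in range(min(n, B) + 1):
            if n - b > N - B:
                continue  # would need more unannotated genes than exist
            count = paths[b] + (paths[b - 1] if b > 0 else 0)  # append a 0 or a 1
            if count and hgt(b, N, B, n) > observed:
                new[b] = count
        paths = new
    return 1 - Fraction(paths[B], comb(N, B))
```

For example, `mhg_pvalue([1, 1, 0, 0, 0, 0])` returns `Fraction(1, 15)`. Only the ordering with both annotated genes at the top reaches the observed mHG of 1/15, so the threshold search itself adds no extra penalty here. The DP visits each lattice point once, which is what avoids simulations.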

    Small Deletion Variants Have Stable Breakpoints Commonly Associated with Alu Elements

    Copy number variants (CNVs) contribute significantly to human genomic variation, with over 5000 loci reported, covering more than 18% of the euchromatic human genome. Little is known, however, about the origin and stability of variants of different size and complexity. We investigated the breakpoints of 20 small, common deletions, representing a subset of those originally identified by array CGH using Agilent microarrays in 50 healthy French Caucasian subjects. By sequencing PCR products amplified with primers designed to span the deleted regions, we determined the exact size and genomic position of the deletions in all affected samples. For each deletion studied, all individuals carrying the deletion share identical upstream and downstream breakpoints at the sequence level, suggesting that the deletion event occurred just once and later became common in the population. This is supported by linkage disequilibrium (LD) analysis, which revealed that most of the deletions studied are in moderate to strong LD with surrounding SNPs and have conserved long-range haplotypes. Analysis of the sequences flanking the deletion breakpoints revealed an enrichment of microhomology at the breakpoint junctions. More significantly, we found an enrichment of Alu repeat elements, the overwhelming majority of which intersected deletion breakpoints at their poly-A tails. In contrast to other reports, we found no enrichment of LINE elements or segmental duplications. Sequence analysis also revealed enrichment of a conserved motif in the sequences surrounding the deletion breakpoints, although whether this motif has any mechanistic role in the formation of some deletions remains to be determined. Considered together with existing information on more complex inherited variant regions, and with reports of de novo variants associated with autism, these data support the presence of different subgroups of CNVs in the genome, which may have originated through different mechanisms.
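The LD measures mentioned above can be illustrated with a short Python sketch. It computes D, Lewontin's D' and r² between a deletion and a nearby SNP from phased haplotypes. It assumes both loci are coded as biallelic 0/1 (for example, 1 = deletion present at one locus, 1 = SNP minor allele at the other). The input format and the function name are hypothetical, and this is not the analysis pipeline used in the study.

```python
def ld_stats(haplotypes):
    """Pairwise LD from phased two-locus haplotypes.

    haplotypes: list of (a, b) pairs with alleles coded 0/1 at locus A and locus B.
    Returns (D, D_prime, r2); D_prime keeps the sign of D (Lewontin's definition).
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n
    p_b = sum(b for _, b in haplotypes) / n
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b  # deviation from independence
    # D_max depends on the sign of D
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = d / d_max if d_max > 0 else 0.0
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2
```

Suppose a deletion always travels with one SNP allele and never with the other. Then both D' and r² equal 1, which is the signature of a single ancestral event on a conserved haplotype.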

    Global Methylation Patterns in Idiopathic Pulmonary Fibrosis

    BACKGROUND: Idiopathic Pulmonary Fibrosis (IPF) is characterized by profound changes in the lung phenotype, including excessive extracellular matrix deposition, myofibroblast foci, alveolar epithelial cell hyperplasia and extensive remodeling. The role of epigenetic changes in determining the lung phenotype in IPF is unknown. In this study we determine whether IPF lungs exhibit an altered global methylation profile. METHODOLOGY/PRINCIPAL FINDINGS: Immunoprecipitated methylated DNA from 12 IPF lungs, 10 lung adenocarcinomas and 10 normal-histology lungs was hybridized to Agilent human CpG Islands Microarrays, and data analysis was performed using the BRB-ArrayTools and DAVID Bioinformatics Resources software packages. Array results were validated using the EpiTYPER MassARRAY platform for 3 CpG islands. A total of 625 CpG islands were differentially methylated between IPF and control lungs, with an estimated False Discovery Rate below 5%. The genes associated with the differentially methylated CpG islands are involved in regulation of apoptosis, morphogenesis and cellular biosynthetic processes. The expression of three genes (STK17B, STK3 and HIST1H2AH) with hypomethylated promoters was increased in IPF lungs. Comparison of IPF methylation patterns to lung cancer or control samples revealed that IPF lungs display an intermediate methylation profile, partly similar to lung cancer and partly similar to control, with 402 differentially methylated CpG islands overlapping between IPF and cancer. Despite this similarity to cancer, IPF lungs did not exhibit hypomethylation of the long interspersed nuclear element 1 (LINE-1) retrotransposon, whereas lung cancer samples did, suggesting that the global hypomethylation observed in cancer is not typical of IPF. CONCLUSIONS/SIGNIFICANCE: Our results provide evidence that epigenetic changes in IPF are widespread and potentially important. The partial similarity to cancer may signify similar pathogenetic mechanisms, while the differences constitute IPF- or cancer-specific changes. Elucidating the role of these specific changes may allow a better understanding of the pathogenesis of IPF.
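This abstract, like the miRNA study above, reports results at a 5% false discovery rate. A common way to control FDR is the Benjamini-Hochberg step-up procedure, sketched below in Python. The abstract does not state which FDR method BRB-ArrayTools applied, so treat this as a generic illustration with a hypothetical function name.

```python
def benjamini_hochberg(pvalues, q=0.05):
    """Indices of hypotheses rejected by the Benjamini-Hochberg step-up procedure at FDR q.

    Sort p-values ascending; find the largest rank k with p_(k) <= k * q / m;
    reject the k hypotheses with the smallest p-values.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank * q / m:
            k = rank  # keep the largest passing rank (step-up)
    return sorted(order[:k])
```

Because the procedure steps up, a p-value that misses its own cutoff can still be rejected when a later, larger p-value passes. For instance, `[0.02, 0.03, 0.04]` at q = 0.05 rejects all three, even though 0.02 exceeds its rank-1 cutoff of about 0.0167.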